Incremental Privacy-Preserving Record Linkage (iPPRL) to Reduce Barriers to Data Sharing and Improve Data Quality [Methods Study], Colorado, 2011-2022 (ICPSR 39738)
Version Date: Mar 23, 2026
Principal Investigator(s):
Toan Ong, University of Colorado Anschutz Medical Campus
https://doi.org/10.3886/ICPSR39738.v1
Version V1
Summary
Researchers often have trouble collecting complete information on patient health, as patients may receive care at different places. Linking patient records from different places may help researchers get a more complete picture.
One way to link records is through personal information, such as names and birth dates. But this method increases risks to patient privacy. Another way, known as privacy-preserving record linkage, or PPRL, masks personal information. But current PPRL methods only work when linking entire sets of patient data, including data that have already been shared and linked. Linking entire data sets takes a long time. Also, sharing the same records multiple times increases data privacy risks.
In this study, the research team developed and tested a new PPRL method called incremental PPRL. This method links only new or updated data rather than re-linking entire data sets.
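As a rough illustration of what masking personal information can look like, the sketch below replaces identifiers with a keyed hash before they are shared. It is a simplified stand-in, not the study's PPRL algorithm: the function names, the choice of fields, the normalization rules, and the use of HMAC-SHA256 with a shared secret key are all assumptions made for this example.

```python
import hashlib
import hmac
import unicodedata


def normalize(value: str) -> str:
    """Lowercase and drop accents, spaces, and punctuation so trivial variants agree."""
    decomposed = unicodedata.normalize("NFKD", value)
    return "".join(ch for ch in decomposed if ch.isalnum()).lower()


def mask_record(first: str, last: str, dob: str, secret_key: bytes) -> str:
    """Return a keyed hash of the normalized identifiers (hypothetical helper).

    Only this token would leave the data holder's site; the name and
    birth date themselves are never shared.
    """
    message = "|".join(normalize(part) for part in (first, last, dob))
    return hmac.new(secret_key, message.encode("utf-8"), hashlib.sha256).hexdigest()


# Two sites that agree on the same key produce the same token for the same person.
key = b"example-shared-secret"  # assumption: key agreed on out of band
token_site_a = mask_record("José", "O'Brien", "1990-01-02", key)
token_site_b = mask_record("jose", "OBRIEN", "19900102", key)
```

Exact-match tokens like these break on typos and name changes, so practical PPRL methods generally use encodings that tolerate such errors; the sketch only shows the masking idea.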
Study Purpose
(1) To develop and implement a novel iPPRL method; (2) To compare iPPRL with existing linkage methods and validate its accuracy and effectiveness
Study Design
The research team extended existing PPRL methods to develop a new iPPRL method. The method successively linked incremental data sets to an initial data set; linkage ended when no new data could be added. The team applied the iPPRL method to a simulated data set containing 115,000 records that mimicked real-world data quality issues.
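The incremental idea described above can be sketched as an index that persists between batches, so each new batch is linked against what is already known without re-processing earlier records. This is a minimal sketch under stated assumptions, not the study's implementation: the `LinkageIndex` class, its method names, and exact matching on a single masked token are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class LinkageIndex:
    """Maps masked tokens to linked-entity IDs and is kept between batches."""

    token_to_entity: dict[str, int] = field(default_factory=dict)
    next_entity_id: int = 1

    def link_batch(self, batch: list[tuple[str, str]]) -> dict[str, int]:
        """Link only the (record_id, masked_token) pairs in `batch`.

        Returns record_id -> entity_id for this batch. Records linked in
        earlier batches are not revisited or re-shared.
        """
        assignments: dict[str, int] = {}
        for record_id, token in batch:
            entity_id = self.token_to_entity.get(token)
            if entity_id is None:
                # Token not seen before: start a new linked entity.
                entity_id = self.next_entity_id
                self.next_entity_id += 1
                self.token_to_entity[token] = entity_id
            assignments[record_id] = entity_id
        return assignments


index = LinkageIndex()
initial = index.link_batch([("a1", "tok-x"), ("a2", "tok-y")])
update = index.link_batch([("b1", "tok-y"), ("b2", "tok-z")])
# Record b1 joins the entity created for a2; b2 starts a new entity.
```

The second call touches only the two new records, which is the property that separates incremental linkage from re-linking whole data sets.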
Then, using real patient data, the research team compared the performance of the iPPRL method with two existing methods that require re-linking whole data sets. The team first linked data from five health systems in the Colorado Congenital Heart Disease registry. They manually reviewed the linked records to create a reference data set containing 4,940 linked records. Next, the team linked the same records using the iPPRL method and the two existing methods. They compared the linkage results from the iPPRL and existing methods with the reference data set.
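One common way to compare linkage results with a manually reviewed reference set is to compute precision, recall, and F-measure over linked record pairs. The description above does not say which measures the team used, so the sketch below is an assumption; the function name and record IDs are hypothetical.

```python
def linkage_metrics(
    predicted: set[frozenset[str]], reference: set[frozenset[str]]
) -> dict[str, float]:
    """Score predicted record pairs against reference (gold-standard) pairs."""
    true_positives = len(predicted & reference)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(reference) if reference else 0.0
    denominator = precision + recall
    f_measure = 2 * precision * recall / denominator if denominator else 0.0
    return {"precision": precision, "recall": recall, "f_measure": f_measure}


def pair(a: str, b: str) -> frozenset[str]:
    """Unordered pair of record IDs, so (a, b) and (b, a) compare equal."""
    return frozenset((a, b))


reference_pairs = {pair("r1", "r2"), pair("r3", "r4"), pair("r5", "r6"), pair("r7", "r8")}
method_pairs = {pair("r2", "r1"), pair("r3", "r4"), pair("r9", "r10")}
scores = linkage_metrics(method_pairs, reference_pairs)
```

Running the same function on each method's output gives directly comparable scores against the one reference set.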
Patients, a patient representative, and researchers provided input throughout the study.
Data Source
A simulated data set with 115,000 records
Colorado Congenital Heart Disease registry data from 2011-2013 for 4,940 patients ages 11-64
Notes
The public-use data files in this collection are available for access by the general public. Access does not require affiliation with an ICPSR member institution.
ICPSR usually offers files in multiple formats so that researchers can access data and documentation in the formats that best suit their needs. If you have questions about the accessibility of materials distributed by ICPSR or require further assistance, please visit ICPSR’s Accessibility Center.

This study is maintained and distributed by the Patient-Centered Outcomes Data Repository (PCODR). PCODR is the official data repository of the Patient-Centered Outcomes Research Initiative (PCORI).
